Abstract

We are interested here in the observation of dynamic processes through the
traces which they leave or which they are made to produce. We assume that it
should be possible to make several observations simultaneously, using a large
variety of independently developed analyzers.

For this purpose, we introduce the original notion of "full trace" to capture 
the idea that a process can be instrumented in such a way that it may broadcast 
all information which could ever be requested by any kind of observer. Each 
analyzer can then find in the full trace the data elements which it needs. This 
approach uses what has been called a "tracer driver" which completes the tracer 
and drives it to answer the requests of the analyzers. 

A tracer driver makes it possible to restrict the flow of information, which
makes this approach tractable. On the other hand, the potential size of a full
trace seems to make the very idea of a full trace unrealistic.

In this work we explore the consequences of this notion in terms of potential
efficiency, by analyzing the respective workloads of the (full) tracer and of
many different analyzers, all of which are likely to run in truly parallel
environments.

To illustrate this study, we use the example of the observation of the
resolution of constraint systems (proof-tree, search-tree and propagation)
using sophisticated visualization tools, as developed in the OADymPPaC project
(2001-2004).

The processes considered here are computer programs, but we believe the 
approach can be extended to many other kinds of processes. 

1 Introduction 



We are interested here in the observation of dynamic processes through the
traces which they leave or which they are made to produce. We assume that it
should be possible to make several observations simultaneously, using a large
variety of independently developed analyzers.

When one wants to observe a process, the usual practice is to instrument it
separately for each type of observation to be made. One thus implements a
new "ad hoc" tracer for each analyzer, or one adapts and completes an existing
one. Such work can largely be avoided if one adopts from the start a general
approach which consists in instrumenting the process so that it produces what
we call a "full trace". This unique trace will then serve all the later
observations which one may plan to make. Each analyzer can then find in the
full trace the data elements which it needs. This approach uses what has been
called a "tracer driver", which completes the tracer and drives it to answer
the requests of the analyzers.

This approach is particularly tempting in practice, as the full trace never
needs to be completely expressed (the exchanges of information remain limited)
and the implementation of the tracer and of its driver is done once and for
all. Its evaluation in terms of feasibility and performance remains
problematic, however.

In fact, this approach makes it possible to reduce the size of the emitted
trace to a useful bare minimum and thus to speed up the whole process. In
return, it makes it possible to consider very large full traces. This also has
a cost, which grows with the size of the full trace. Beyond a certain size, the
production cost of the trace is likely to become prohibitive. This is precisely
the question of interest here: how far can one go with this approach in
practice without slowing down a process excessively, and how? To get a more
precise idea, one must take into account not only the time of trace production,
but also how the trace will be used.

In [2] and [3] the notion of tracer driver is presented and experimented with
in the context of finite domain constraint resolution [4]. But the question of
the nature of the trace whose emission is controlled by the tracer driver is
not directly tackled there. In this work we explore the consequences of this
notion in terms of potential efficiency, by analyzing the respective workloads
of the (full) tracer and of many different analyzers, all of which are likely
to run in a truly parallel environment.

In this work, we introduce the notion of "full trace" to capture the idea that a 
process can be instrumented in such a way that it may broadcast all information 
which could ever be requested by any kind of observer. We then analyze the 
nature of the work of the tracer and the driver, and the distribution of the 
functions between the tracer and its driver on the one hand, and the analyzers 
on the other hand. This allows us to better estimate how powerful, useful and 
efficient the concept of full trace can be, provided it is accompanied by the right 
architecture of all involved components. 

To illustrate this study, we take the example of the observation of the
resolution of constraint systems (proof-tree, search-tree and propagation)
using sophisticated visualization tools, according to the method developed in
the projects DiSCiPl (1997-2000) [5] and then OADymPPaC (2001-2004) [1]. This
field is of particular interest because the traces include representations of
complicated and potentially bulky objects, and the computations (evolution of
the domains of the variables) are at the same time logical and stochastic.
Constraint systems, because of the complexity of their resolution, are very
close to true complex systems.



In this extended abstract we present successively the concept of full trace
and its incremental "compressed" version, then the question of their semantics.
Finally, we analyze the problem of the distribution of work between a driven
tracer and external analyzers which work only with the useful trace flow
provided to them in response to their requests.

2 Full Trace 

We introduce here the concept of full trace. By definition, the full trace of a
process contains all one may like to know about it during its execution (this
likely includes a description of the process itself). A process is
characterized at a given moment $t$ by a state $S_t$. It is outside the scope
of this article to define exactly what such a state is. It will only be
supposed that it can be described by a finite set of parameters, $p_n$ denoting
the $n$th parameter and $p_{n,t}$ its value at moment $t$. The concept of
"moment" will be specified hereafter. A current state will be denoted by the
list of the values of its parameters. It is also assumed that the
transformation of a state into another is made by "steps" characterized by an
action $a$. The set of actions performed at moment $t$ is labelled $a_t$.

So defined, the concept of full trace seems not to have any application. In
practice, there are only approximations; but the important point here is to
admit that, whatever the level of detail at which one wishes to observe a
process, there is always a threshold which makes it possible to define such a
trace. In the case of a program, this amounts to a more or less thorough
instrumentation which produces a trace, in other words to a program augmented
with a tracer.



Definition 1 (Virtual Full Trace).

A virtual full trace is an unbounded sequence of trace events of the form
$e_t : (t, a_t, S_{t+1})$ comprising the following elements:

- $e_t$: unique identifier of the event.

- $t$: chrono. Time of the trace. It varies only by unit values and is always
increasing. It is to be distinguished from the time of the observed processes
or of the analyzers, which may not be monotonic with respect to the chrono.

- $S_t = p_{1,t}, \ldots, p_{n,t}$: parameters at chrono $t$. In a trace event
the parameters are called attributes and $S_t$ the full current state. The
parameters may describe objects or actions performed to reach the new state.
$S_t$ is the last "observed" state before the event $e_t$ occurs. The
parameters (or attributes) correspond to the new reached state $S_{t+1}$.

- $a_t$: action, an identifier of the set of actions characterizing the step
from the state $S_t$ to the state $S_{t+1}$.
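
Definition 1 can be made concrete with a small record type. The following is a
minimal Python sketch and not part of any tracer discussed in this paper: the
names `TraceEvent` and `virtual_full_trace`, as well as the toy counter process
and its `increment` step function, are assumptions introduced only for this
illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Tuple

State = Dict[str, Any]  # a full state S_t: parameter name -> current value


@dataclass(frozen=True)
class TraceEvent:
    """One event e_t : (t, a_t, S_{t+1}) of a virtual full trace."""
    chrono: int   # t, increases by exactly one from one event to the next
    action: str   # a_t, identifier of the step taken from S_t
    state: State  # S_{t+1}, the full state reached by that step


def virtual_full_trace(
    initial: State,
    step: Callable[[State], Tuple[str, State]],
    max_events: int,
) -> Iterator[TraceEvent]:
    """Yield the first max_events events of the (unbounded) trace."""
    state = dict(initial)
    for chrono in range(max_events):
        action, state = step(state)
        yield TraceEvent(chrono, action, dict(state))


# Hypothetical toy process: a single counter incremented at each step.
def increment(state: State) -> Tuple[str, State]:
    return "incr", {"counter": state["counter"] + 1}


events = list(virtual_full_trace({"counter": 0}, increment, max_events=3))
```

Each event carries the whole reached state, which is exactly what makes the
virtual full trace redundant and motivates the more compact forms defined
below.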

Any trace effectively produced by a process can be regarded as a partial full
trace. In practice, one "sees" only partial traces, which start after the
moment when the observed process is in a presumably initial state $S_0$. We
will limit ourselves in this summary to a single example of trace: the trace of
the proof-trees in Prolog systems based on standard Prolog [6]. This trace is
far from being a full trace (much useful information, even information easily
available at any moment, is not represented in it). Here is an extract:

This is a Byrd trace [7], adopted in the majority of Prolog systems. The
moment "0" corresponds to the launching of the resolution of a goal. Each trace
event corresponds to a stage in its resolution. An event contains the following
information: two attributes which indicate depths (the first, the depth in the
search-tree and the second, the depth in the proof-tree), and a "port" which
corresponds to the action which made it possible to reach this stage. Thus the
port call corresponds to a call to the indicated subgoal and the installation
of the subgoals to solve, and the port exit to the success of the subgoal.
There are other ports (a total of 5 in the case of GNU-Prolog), not detailed
here. The first "call" marks the launching of the first goal (if there is only
one). If the resolution terminates, the trace is finite; otherwise it is
unbounded. The last attribute gives the subgoal to solve. This trace comprises
neither identifiers nor chrono; the chrono corresponds to the sequential order
of emission of the trace events, and the resulting chrono plays the role of an
identifier.

It is interesting to note that the objective of such a trace is to display all
the steps of the evolution of a proof-tree until all the possible proofs are
obtained (the port fail corresponds to a failure and redo to a
nondeterministic goal for which a new resolution will be tried), and to some
extent to describe the search-tree as well. However, the (partial) proof-tree
is never explicitly described (no more than the search-tree). This trace thus
does not directly provide the parameters of interest (the partial proof-tree).
This brings us to the following observation. The parameters of the full state
are not given explicitly in this trace, but only some attributes (port and
goal) from which it may be possible to recover it. We will reconsider this
point later. These attributes thus give only incremental information, which
makes it possible to obtain the new proof-tree after a resolution step. We will
say that the trace is "incremental". This leads us to define the following
particular traces, which we introduce here without further detail.

Definition 2 (Discontinuous, Effective, Incremental Full Trace).

A trace is discontinuous if it is a (full) trace which contains successive
events whose chrono may differ by more than one unit. There are "holes" in it,
either because the emission is discontinuous, or because the observing process
"listens" to it only occasionally. Notice that the discontinuity concerns only
the trace emission and not its extraction; in other words, a moment corresponds
to the activation of the tracer.

A trace is effective if it is a (full) trace of the form $e_t : (t, A_t)$ such
that, from the knowledge of $(S_t, A_t)$, one can deduce $(a_t, S_{t+1})$.
$A_t$ denotes a set of attributes. The effective trace is the trace emitted by
a tracer, the one which is actually "visible". The virtual full trace is a
particular case of effective trace where the attributes $A_t$ are the action
label $a_t$ and the parameters of the reached state $S_{t+1}$. Another
particular case is the full "incremental" trace.

A trace is incremental if the attributes are such that only the changes
affecting the current state are noted. It has the form
$e_t : (t, \Delta_t^+)$ where $\Delta_t^+$ contains the description of the
actions which modify the values of the parameters of the moment $t$. To remain
a full trace, this trace must satisfy the following condition: from the
knowledge of $(S_t, \Delta_t^+)$ one can deduce $(a_t, S_{t+1})$.
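
The condition attached to incremental traces can be illustrated by a small
round trip between a tracer and an analyzer. The Python sketch below is only an
illustration under simplifying assumptions: a state is a dictionary over a
fixed set of parameters, a delta lists the parameters whose value changed, and
the function names `to_incremental` and `rebuild` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

State = Dict[str, Any]
Delta = Dict[str, Any]      # Delta_t^+: only the parameters that changed
Event = Tuple[int, Delta]   # incremental event e_t : (t, Delta_t^+)


def to_incremental(states: List[State]) -> List[Event]:
    """Tracer side: emit only what changed between S_t and S_{t+1}."""
    events = []
    for t in range(len(states) - 1):
        before, after = states[t], states[t + 1]
        delta = {k: v for k, v in after.items() if before.get(k) != v}
        events.append((t, delta))
    return events


def rebuild(initial: State, events: Iterable[Event]) -> Iterator[State]:
    """Analyzer side: recover S_{t+1} from (S_t, Delta_t^+)."""
    state = dict(initial)
    expected = 0
    for chrono, delta in events:
        if chrono != expected:
            # A hole: the trace is discontinuous and the full state can no
            # longer be maintained from the deltas alone.
            raise ValueError(f"missing events before chrono {chrono}")
        state.update(delta)
        expected = chrono + 1
        yield dict(state)


states = [
    {"depth": 0, "goal": "p"},
    {"depth": 1, "goal": "p"},
    {"depth": 1, "goal": "q"},
]
events = to_incremental(states)
rebuilt = list(rebuild(states[0], events))
```

With a continuous trace the analyzer recovers every state; as soon as an event
is missing, it has to obtain a full current state from the tracer before it can
resume.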

In this extended summary, when the distinction is not absolutely necessary, 
one will not distinguish between virtual and effective traces, and one will speak 
indifferently about parameters or attributes. 

Practically all traces are incremental (and thus use attributes), as the
preceding example illustrates well, because emitting a full state in each
trace event would obviously be prohibitive: the events would be much too large
and the trace would be extremely redundant. The condition simply requires that
one can retrieve the full trace from the transmitted attributes and the
preceding full state. In practice the observed processes will produce partial
traces only. In this case the retrieval of the full state is impossible. If
the observing process needs to take into account only a partial state, a
partial trace can be sufficient to enable it to maintain a consistent partial
state. On the other hand, if it needs, at a given moment, to know a full state,
or at least a more complete state, it will have "to ask" the tracer to provide
it with a full current state or a part of it.

Practically all traces are discontinuous, even if each part of them
(necessarily finite) can often be regarded as a single finite trace. This also
raises the problem of the knowledge of the initial state $S_0$ in which the
observed process was at the moment of the initial trace event (chrono equal to
0), and thus of the communication of the full initial state $S_0$ to the
analyzers before any trace event.

These two reasons justify an interest in the manner of obtaining such a state.
The current value of a parameter may exist as such in the process, in which
case only a small computation is required to extract it. On the other hand, it
may not exist, and obtaining it may require a partial re-execution of the
process (this capacity is used in the analyzers of the CHIP environment of
Cosytec [5] or of CLPGUI [1]). This makes it necessary to stop the execution of
the observed process (at least when that is possible), and to give the
observing process the ability to stop and resume the observed process. This
leads to the idea sketched in the introduction of synchronization primitives
between processes, a kind of "image freezing" which makes it possible to
complete the information at any moment, according to need. This also results in
the need for a function giving "access to the current state".

It is important to note that with a full effective trace (as opposed to the
full virtual trace), and in particular with an incremental one, keeping a full
current state is the responsibility of the observing process and not of the
observed process: the tracer of the observed process is no longer obliged to
compute the requested parameters explicitly (only the requested attributes are
computed). This supposes, however, that the observed process contains at least
"recovery points" at which a full current state is preserved and accessible, so
that the observing process can resume the trace and restart from a full current
state. This aspect is not treated here.

3 Observational Semantics 

This section deals with the semantics of the full trace, called here
"observational semantics".

The trace does not explain anything. It is only a collection of facts. The
question nevertheless arises of how to understand the trace. For that one needs
two levels of semantics. The first level corresponds to the description of the
actions and objects appearing in the parameters and attributes of the traces,
i.e. a semantics of the observed data. The second level corresponds to a kind
of trace "explanation", i.e. a semantics describing how the values of the
parameters at the moment $t$ are derived from the values of the parameters at
the moment $t - 1$ and how the actions $a_t$ are selected. Clearly this means
that one has a model of the process which accurately describes the evolution of
the trace between two events. The form of this semantics is the subject of
another work in progress.

"To read" a trace, one needs only the first level of semantics. Only the relations 
between the parameters in the same state or attributes in the emitted trace 
events must be known. In the example of the Prolog trace, the properties of the 
object "proof-tree" must be known to understand the relation between the depth 
of the tree and the tree itself. The semantics of the trace (full and/or incremental) 
thus contains the description of the relations between the parameters which 
relates the semantics and the produced attributes. But to understand the trace 
comprehensively, it is necessary to go further in the knowledge of the process 
itself. 

What we call here "observational semantics" (OS) is the semantics of the
tracer (the first level of semantics, although part of the OS, can be seen as a
"semantics of the trace"). It uses the semantics of the objects and actions of
the full trace and can be seen as an abstract semantics in the sense of Cousot
[8]. It is a kind of natural semantics [9] which can be expressed by a finite
set of rules of the form

$a : \mathit{Condition}(S) \rightarrow S' = a(S)$

such that any trace event $e_t : (t, a_t, S_{t+1})$ can be obtained from the
state $S_t$ and the action $a_t$ by application of the rule $a$ of the OS. The
(set of) actions $a$ performed at chrono $t$ is denoted $a_t$. This semantics
can take the form of a structural operational semantics [10] or of an evolving
algebra [11], and can be more or less refined, "big-step" or "small-step" [12].
It is in fact very tempting to have a complete semantics for the tracer in
order to allow a clear implementation of it.

Fig. 1. A rule of the abstract model of Prolog Proof-Tree displayed in [13]

Coming back to the example of Prolog, a tracer augmented with the trace of
constraint resolution (proof-tree, search-tree, labeling-tree and propagation),
called Codeine, has been implemented in this manner on GNU-Prolog [13]. An
abstract model has been defined [14], which has then been implemented on
several solvers [1]. This made it possible to meet two principal objectives:
portability of the analyzers, whose input data are based on the full trace, and
robustness of the tracers, whose implementation is guided and improved by a
good methodological approach [15] based on a pre-specified rigorous semantics.

It is useful to note, however, that a complete semantics of the trace, which
would be a complete formalization of the observed process, is almost impossible
in practice, because of the degree of refinement that it would imply. For
example, an attribute included in many traces is the CPU time consumed by the
process since the "beginning" of the trace emission. To formalize its
variations, which in fact come from the host system, would amount to
introducing into the observational semantics a model of the system in which the
process is executed.

Finally, an OS should not be confused with a complete operational semantics,
which would make it possible to replay the course of the process from the
initial state and the rules alone. The current full state is not sufficient to
know which rule may apply. For example, the model developed for finite domain
constraint resolution described in [13] contains a set of operational rules
such that at least one rule can be applied at each state, but whose conditions
are not always precise enough to decide how it can be applied. Such is the case
of the back to rule, depicted in Figure 1.

It contains a condition I £ dom(£) which must be satisfied, meaning that 
the new current node belongs to the search-tree, but nothing says which node 
must be selected. The sole knowledge of the full current state (which includes the 
current search-tree) is not sufficient. The knowledge of the current trace event 
is necessary to know which is this new node and thus to know how this rule is 
in fact instantiated. 
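
The need for the trace event to instantiate such a rule can be sketched as
follows. This Python fragment is a simplified illustration and not the model of
[13]: the state layout (a set of node identifiers under `search_tree` and a
`current` node) and the function names are assumptions made for the example.

```python
from typing import Any, Dict, Set

State = Dict[str, Any]


def back_to_candidates(state: State) -> Set[int]:
    """Nodes satisfying the rule's condition: membership in the search-tree."""
    return set(state["search_tree"])


def apply_back_to(state: State, event: Dict[str, Any]) -> State:
    """Instantiate the rule with the node named by the trace event."""
    node = event["node"]
    if node not in state["search_tree"]:
        raise ValueError(f"node {node} is not in the search-tree")
    # The state alone does not say which node is selected; the event does.
    return {**state, "current": node}


state = {"search_tree": {0, 1, 2}, "current": 2}
candidates = back_to_candidates(state)  # several nodes satisfy the condition
new_state = apply_back_to(state, {"action": "back to", "node": 1})
```

The condition alone leaves three possible successors here; only the event read
from the trace determines that node 1 becomes the current node.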



4 Tracer Driver and Evaluation 

The idea of using a tracer driver was proposed by M. Ducassé and J. Noyé in the
early days of Prolog [16], in the context of logic programming, and has since
been regularly developed and tested in various environments [17]. The
originality of this approach consists in proposing that the full trace should
not be communicated in its entirety to an analyzer, but that the analyzer
should receive only the part of the trace which concerns it. Instead of
broadcasting "urbi et orbi" a full trace that every analyzer would have the
task of filtering, filtering is performed at the source, i.e. by the observed
process, which then behaves like a server of traces. The analyzer is a client
which merely indicates to the observed process the trace which it actually
needs. The complete separation of the observed and observing processes then
leads to a client/server architecture with exchanges of information: the
client indicates to the server the trace it needs, and the server provides it
only with what it requires. The filtering of the trace is no longer carried out
by the analyzer, but by the observed process. It is the task of the tracer
driver to perform all requested filtering and to dispatch the traces requested
by the clients. This architecture, as well as the exchanges between the
processes, has been described in particular in [1] and [2]. We will not go
further into the details in this extended abstract. In addition, only the
aspects related to the possibility of modulating the emitted traces according
to the needs of the analyzers will be considered; the previously mentioned
possibility of synchronizing the processes will not be considered here.
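
The client/server organization described above can be sketched in a few lines.
The following Python fragment is a minimal illustration and does not reproduce
the interface of any existing tracer driver: the class `TracerDriver`, its
methods `register` and `on_event`, the `outbox` standing in for the
communication channel, and the event attributes used in the example are all
assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]               # attributes of one trace event
Condition = Callable[[Event], bool]  # which events an analyzer asks for


class TracerDriver:
    """Filters the trace at the source and dispatches it to its clients."""

    def __init__(self) -> None:
        self._requests: Dict[str, Tuple[Condition, List[str]]] = {}
        self.outbox: Dict[str, List[Event]] = {}

    def register(self, analyzer: str, condition: Condition,
                 attributes: List[str]) -> None:
        """Record the request of an analyzer (the client)."""
        self._requests[analyzer] = (condition, attributes)
        self.outbox[analyzer] = []

    def on_event(self, event: Event) -> None:
        """Called by the tracer for each event of the full trace."""
        for analyzer, (condition, attributes) in self._requests.items():
            if condition(event):
                # Only the requested attributes are sent to this client.
                self.outbox[analyzer].append(
                    {name: event[name] for name in attributes})


driver = TracerDriver()
driver.register("calls", lambda e: e["port"] == "call", ["chrono", "goal"])
driver.register("deep", lambda e: e["depth"] >= 2, ["chrono", "depth"])

for event in [
    {"chrono": 0, "depth": 1, "port": "call", "goal": "p"},
    {"chrono": 1, "depth": 2, "port": "call", "goal": "q"},
    {"chrono": 2, "depth": 2, "port": "exit", "goal": "q"},
]:
    driver.on_event(event)
```

Each analyzer receives a different, smaller trace, and none of them has to
filter anything on its own side.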

This approach makes it possible to reduce the size of the emitted trace to a
useful bare minimum and thus to speed up the whole process. In return, it makes
it possible to consider very large full traces. This also has a cost, which
grows with the size of the full trace. Beyond a certain size, the production
cost of the trace is likely to become prohibitive. This is precisely the
question of interest: how far can one go with this approach in practice without
slowing down a process excessively, and how? To get a more precise idea, one
must take into account not only the time of trace production, but also how the
trace will be used.

One could think at first sight that having the observed process produce a full
trace makes this approach unrealistic. The server is indeed slowed down by the
simple fact of having to compute a great number of parameters which will
perhaps never be used. This approach would thus be very penalizing for a
process instrumented with an expensive tracer whenever only a small portion of
the full trace is used. On the other hand, and precisely in this case, a
considerable saving is made thanks to filtering at the source and to the
transmission of a limited trace, whose encoding and transmission costs are then
greatly reduced. The idea is that in practice emitting only a small part of the
trace, filtered at the source, compensates for the extra work, mainly related
to the updates of all the parameters of the full trace in the analyzed process.



On the other hand, if many analyzers are activated simultaneously and together
use at a given moment a trace equivalent to a full trace, the question of
whether or not to produce the full trace no longer arises (it must be produced
in any case), and the benefit of using a tracer driver is worth questioning. We
show that even in this case the driver can save time.

Additionally, another type of saving must be considered, which is complementary
to the previous one: the use of an incremental trace as the effective trace
instead of the virtual full trace. In this case there is no loss of information
(the equivalent of the full trace is still transmitted), but there is a
reduction of the amount of data transmitted (a kind of "data compression" is
used in order to limit the size of the emitted data flow). In this latter case
the tracer performs a kind of "compression" and the analyzer a kind of
"decompression", and both tasks must be taken into account in the analysis of
the workloads. It will be shown that the performance of the tracer and of the
analyzers can be influenced more by this kind of load than by the size of the
full trace.

In order to be more precise, it is necessary to analyze the distribution of
work among the various processes.

The times to take into account for a detailed performance analysis are the
times concerning the process itself, its tracer and its driver on the one hand,
and the times concerning one analyzer on the other hand (one must suppose here
that, if there are several analyzers, they run with true parallelism):

Process, tracer and driver:

T_prog + T_core + T_cond + T_extract + T_encode-and-com

Analyzer:

T_filter + T_decode + T_rebuild + T_exec

From the side of the process, tracer and driver: 

- T_prog: time devoted to the execution of the non-instrumented program (or of
the instrumented program with the tracer deactivated).

- T_core: additional execution time of the process once instrumented to produce
the full trace and with the tracer activated. It is the time devoted to the
construction of the elements necessary for a possible later extraction of the
parameters of the current state. This time is largely influenced by the size of
the full trace. At this stage all computations must be performed because, in
the presented approach, one considers that it must be possible to produce the
full current state at any moment (in particular in the case of a discontinuous
trace). This time is related to the form of the full trace only and does not
depend on what will be emitted. If all the parameters of the full trace are
already part of the process, this time is zero.

- T_cond: time of checking the conditions defining the traces to be emitted for
each analyzer (filtering). This time is zero if there is no filtering (emission
of the full trace).

- T_extract: computing time of the parameters requested during or after
filtering.

- T_encode-and-com: time for formatting the trace (encoding), possible
compression, and emission.

The time specific to the driver is T_cond. The other times, namely T_core,
T_extract and T_encode-and-com, can be regarded as times related to the tracer.

It should be noted here that, in this approach, the driver acts only on the
choice of the parameters and the attributes to be contained in the effective
trace, i.e. on the communicated information. It has no means of influencing the
form of the attributes, for example the degree of "compression" of the
information. In fact, this "compression" is coded here in the form of the
attributes. The nature of the attributes (incremental information or not) is
part of the tracer and cannot be modified or adapted through the tracer driver.
On the other hand, any additional compression algorithm used to reduce the size
of the information flow belongs to the "encoding" stage. Generalizing this
idea, one could also consider putting into the trace more abstract attributes,
adapted to some more specific use, such that the whole trace has a reduced
size. The corresponding attribute computation time would thus be attributed to
the "extraction" stage. This generalization is not studied here.

From the side of the analyzer: 

- T_filter: time of filtering by an analyzer. This time is zero if filtering is
performed at the source (the trace events sent to a particular analyzer can
indeed be tagged at the source). In fact, by precaution but mainly for lack of
an implemented tracer driver, many external analyzers will filter the trace
again. However, it is not necessary to consider this case here.

- T_decode: time of decoding the received trace. This time cannot be avoided,
any more than the time of encoding and communication T_encode-and-com. It must
be compared with the encoding part of T_encode-and-com. If
compression/decompression algorithms are used, it is because it is considered
that, even if both times are added, a substantial amount of time will be saved
on transmission.

- T_rebuild: time of rebuilding the full trace from the effective trace
(computation of the current parameters from the emitted attributes). This time
must be compared with T_extract, the computation time of the attributes. It is
considered that such a time, even when added to T_extract, compares favorably
with the cost of trace emission.

- T_exec: execution time of the analyzer's own functions. With very
sophisticated analyzers (such as those used for data analysis), this time can
become so large that it makes the time of trace production negligible.
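
The two sums above lend themselves to a simple cost model. The Python sketch
below only illustrates the bookkeeping: the class and field names are
assumptions, and all the figures are hypothetical values in arbitrary time
units, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessTimes:
    """Times on the side of the process, its tracer and its driver."""
    prog: float            # T_prog
    core: float            # T_core
    cond: float            # T_cond
    extract: float         # T_extract
    encode_and_com: float  # T_encode-and-com

    def total(self) -> float:
        return (self.prog + self.core + self.cond
                + self.extract + self.encode_and_com)

    def slowdown(self) -> float:
        """Ratio between the traced and the untraced execution times."""
        return self.total() / self.prog


@dataclass(frozen=True)
class AnalyzerTimes:
    """Times on the side of one analyzer."""
    filtering: float  # T_filter
    decode: float     # T_decode
    rebuild: float    # T_rebuild
    execution: float  # T_exec

    def total(self) -> float:
        return self.filtering + self.decode + self.rebuild + self.execution


# Hypothetical figures. T_core is the same in both cases because it depends
# only on the form of the full trace, not on what is emitted.
full_emission = ProcessTimes(prog=100, core=20, cond=0,
                             extract=30, encode_and_com=50)
driven = ProcessTimes(prog=100, core=20, cond=5,
                      extract=3, encode_and_com=5)
analyzer = AnalyzerTimes(filtering=0, decode=4, rebuild=6, execution=60)
```

In this made-up setting the small cost of checking conditions is more than
offset by the reduced extraction and emission times; whether this holds for a
real tracer has to be established experimentally.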

Example: Codeine, described in [13], implements the generation of a full trace
for the analysis of constraint resolution, as an extension of GNU-Prolog. The
current state $S_t$ contains (among other attributes): proof-tree, search-tree,
constraints state and variables with their domains. For a complete description
of the full trace see [1] (where it is called the "generic trace"). Even if
Codeine does not generate a full trace (as full as it could possibly be), the
Codeine trace is considerably richer than the simple Byrd trace of GNU-Prolog,
which it strictly contains. Only an incremental trace is generated, and only
the proof-tree and the constraints and variables states can be rebuilt later
from the current state of the process alone (the cost of extraction is reduced
to the cost of the data management performed by the tracer itself). To obtain
the current search-tree the process would have to be partially re-executed,
which would require freezing its current execution. The cost of keeping a
search-tree permanently accessible would clearly be intractable because of its
size. On the other hand an analyzer can, using the produced incremental trace,
maintain all these objects permanently (obtaining $S_t$ or those of its
parameters which are useful to it). Codeine also contains a tracer driver; the
definitions of the traces to be emitted (specification of the emitted trace)
are stored in a data file which must be provided before the process starts.

Times are distributed as follows: 

From the side of the process, tracer and driver: 

- T_prog: execution time of GNU-Prolog with the "switches" of the tracer (a
small part of T_core, of negligible duration in general).

- T_core: time of construction of the parameters of the full Codeine trace
(collecting of all data useful to extract the full trace); a kind of GNU-Prolog
plug-in.

- T_cond: time of filtering the full trace to select all the requested traces.

- T_extract: computing time of the attributes corresponding to the parameters
requested during filtering.

- T_encode-and-com: computing time of the attributes corresponding to the
requested parameters, encoding (XML format or Prolog term) of the emitted
trace, and emission time of the incremental trace.

From the side of the analyzers: experiments in the framework of the OADymPPaC
project with sophisticated analyzers (intensive visualization of graphs,
visual data analyses) revealed the following costs.

— T_filter and T_decode: both times are intertwined in a syntactic analysis
module (XML) of the full trace. Part of this re-filtering could have been
avoided, but the "driven tracer" approach could not be taken into account in
the analyzers built during this project.

— T_rebuild: time of rebuilding the parameters of the full trace (variables
domains, active constraints set, search-tree ...). This time grows (by a factor
that was not estimated) with the size of the data, sometimes related to the size
of the trace, with a considerable slow-down of the analyzer during the
"reading" of the trace. When the trace is analyzed "on the fly", the low speed
of an analyzer may cause a strong slow-down of the observed process.

— T_exec: time of construction of the objects to be visualized (graphs, data
tables). These times can grow exponentially with the size of the data.
The efficiency of the algorithms used is crucial here. This can also cause a
slow-down of the process, and argues in favor of a preliminary treatment
of information before transmission (for example, selecting distinguished
nodes to put in the trace to reduce the size of a drawn graph, or collecting
groups of variables as a unique attribute to reduce the number of lines in a
matrix).

It has been shown experimentally in [3] that the behavior of the Codeine
tracer with the full trace above compares favorably with the behavior of
GNU-Prolog with the Byrd's trace only. Furthermore, the filtering realized by the
tracer driver does not degrade performance.

We give here a theoretical justification of this result, showing that the
approach is supported by analysis as well as by experiment.

In [2] L. Langevine observes that filtering the traces all together is more
efficient than filtering them one after the other. This comes from the fact that
running several automata together may be more efficient than running only one.
Indeed, the essence of filtering relates to simplified conditions whose role is to
select first the trace events containing the ports requested by an analyzer. A
finer additional filtering is then carried out, but on a much smaller number
of events. One can assume that this first filtering applies to
a trace whose language is regular. It is then possible
to consider that each filter is itself a regular expression whose recognition on
the full trace can be done using a non-deterministic finite-state automaton.
Filtering then corresponds to recognition by a union of as many finite-state
automata as there are active analyzers requesting a trace. However, the
resulting automaton, once optimized, can be much more efficient than the most
efficient of the automata associated with a single analyzer (the union of the
automata can be more efficient in terms of computation steps than only one of
them [IE]). As these filtering operations are extremely frequent, because they
apply to all the trace events, the speed-up can be considerable.
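
The benefit of filtering all requests together can be sketched in Python under a strong simplification: each first-level filter is assumed to be just a set of requested ports, so the merged automaton degenerates into a single lookup table. The analyzer names and port names below are hypothetical, and genuine regular-expression filters would require an actual union and optimization of automata.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set

# Hypothetical requests: analyzer name -> set of ports it wants to receive.
requests: Dict[str, Set[str]] = {
    "search_tree_view": {"choice_point", "back_to", "solution"},
    "domain_view": {"reduce", "restore"},
    "stats": {"reduce", "solution"},
}


def filter_one_by_one(port: str, requests: Dict[str, Set[str]]) -> Set[str]:
    """Apply each analyzer's filter separately: one test per analyzer."""
    return {name for name, ports in requests.items() if port in ports}


def build_merged_filter(requests: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Merge all filters once into a single table: port -> interested analyzers."""
    table = defaultdict(set)
    for name, ports in requests.items():
        for port in ports:
            table[port].add(name)
    return {port: frozenset(names) for port, names in table.items()}


def filter_merged(port: str, merged: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """One lookup per trace event, whatever the number of analyzers."""
    return merged.get(port, frozenset())


merged = build_merged_filter(requests)
for port in ["reduce", "awake", "solution", "choice_point"]:
    # Both strategies select the same analyzers for every event.
    assert filter_one_by_one(port, requests) == filter_merged(port, merged)
```

The merged table is built once, before the trace is produced, while the per-event work no longer grows with the number of active analyzers.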

We are now in position to analyze the respective workloads. 

First observe that the respective times of both sides may be considered as
cumulative or not, depending on whether the respective processes are run
sequentially or with true parallelism. In the latter case, if all processes are run
on different processors (a situation which can be considered with such an
approach), only the slowest process must be taken into account to evaluate the
execution time of the whole system.
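
The difference between the two situations can be made concrete with a small Python sketch. All durations are made-up values in arbitrary units, not measurements, and the model deliberately ignores synchronization between the processes.

```python
from typing import Dict, List

# Made-up durations (arbitrary units), chosen only for illustration.
tracer: Dict[str, int] = {
    "prog": 100, "core": 40, "cond": 2, "extract": 10, "encode_and_com": 20,
}
analyzers: List[Dict[str, int]] = [
    {"filter": 0, "decode": 30, "rebuild": 80, "exec": 150},
    {"filter": 0, "decode": 10, "rebuild": 20, "exec": 40},
]


def side_time(times: Dict[str, int]) -> int:
    """Total time spent by one process (tracer side or one analyzer)."""
    return sum(times.values())


def total_sequential(tracer: Dict[str, int], analyzers: List[Dict[str, int]]) -> int:
    """Sequential execution: all times are cumulative."""
    return side_time(tracer) + sum(side_time(a) for a in analyzers)


def total_parallel(tracer: Dict[str, int], analyzers: List[Dict[str, int]]) -> int:
    """One processor per process: only the slowest process counts."""
    return max([side_time(tracer)] + [side_time(a) for a in analyzers])


print(total_sequential(tracer, analyzers))
print(total_parallel(tracer, analyzers))
```

With these values the slowest process is the first analyzer, so in the parallel case it alone determines the execution time of the whole system.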

— T_prog and T_core on one side, T_exec on the other side.

These times correspond to times specific to the tracer and the analyzers (the
slowest analyzer has the main influence on the execution time). These times
are incompressible and T_core depends only on the size of the virtual trace.

— T_cond on one side and T_filter on the other side.

At least one of these times must be zero, and performance can be improved if
the filtering is done at the source. In any case, this time appears negligible
compared with the other times, whatever the size of the full trace.

— T_extract on one side and T_rebuild on the other side.

These times correspond respectively to the computation of the attributes
from the parameters and reciprocally. While filtering can reduce the extraction
time (considering the low number of selected trace events
and attributes, and the fact that most of the parameter computation
work is already included in T_core), the "rebuilding" time can be very
large and will probably be significantly greater than the extraction time.

— T_encode-and-com on one side and T_decode on the other side.

These times can be significant, but the gain in communication time
(reduced emitted volume) largely compensates for the encoding/decoding times.

To summarize, the tracer and the slowest analyzer are the main factors
influencing the overall performance, and they can increase the workload of both
sides considerably. However, the use of a tracer driver and of "trace
compression" techniques makes it possible to compensate, partially but sometimes
very effectively, for the extra costs related to the use of a full trace.

5 Conclusion 

We introduced the concept of full trace in order to take into account the multiple
possibilities of analysis of a dynamic process. The analyzed process is
instrumented with, in addition to its tracer, a tracer driver, and the analyzers have
the possibility of addressing orders to the driver. A client/server architecture is
then considered to describe the interactions between the analyzer processes and
the analyzed process. This enabled us to tackle the problem of the evaluation of
this approach in terms of efficiency.

We tried to assess how the introduction of a tracer driver could improve
the global efficiency of the analysis of a dynamic process using several analyzers
observing the process simultaneously. We initially observed that a full
trace could be particularly expensive to produce, but that part of this cost
could be transferred to the analyzers without loss of analysis capacity (the full
trace can be retrieved by the analyzers). The most unfavorable case in terms
of efficiency corresponds to the situation where the equivalent of a full trace must
be built, extracted and emitted. In this case, the full trace is in fact a union of
all the reduced traces requested by several analyzers. We showed that, even in
this case, filtering realized by a tracer driver was able to bring a significant
benefit. Our observations during the projects DiSCiPl [5] and OADymPPaC
[1], using very sophisticated analyzers as well as powerful visualization tools
([19], [20]), also showed that the limits of performance came more often from the
analyzers than from the tracer, even with a full trace.

Finally, this study opens a series of questions:

— How far is it possible to implement a very broad full trace? Of
course the limits of the approach are obvious: the concept of full trace is
meaningful only with regard to a family of possible analyses, well known
and selected in advance. Nothing guarantees a priori that an additional
instrumentation of the observed process will never be necessary. But beyond
this aspect, the interesting question relates to the feasibility of the
implementation of a very fine-grained full trace. On the one hand, one can
compensate for some production costs of such a large trace by emitting
an effective trace of limited size; on the other hand, the analyzers
(whose use can be temporary or exceptional) will bear most of the
time of development/rebuilding of the full trace (necessary even if an analyzer
uses only a part of it) and will occasionally slow down the observed process.

— Which interaction language to use and which dialogue between
the driver and the analyzers? This question has been little tackled
here and mainly remains out of the scope of this article. The tendency,
however, is toward the use of a language like XML. It is the way chosen by the
OADymPPaC project, and the possibility of reducing the flow of trace to its
bare minimum encourages this. Nevertheless, our experiments showed that
the communication needed to be optimized by combining usual methods of
data compression with specific trace compression such as "incremental
attributes", but also by introducing more abstract attributes in the trace. This
leads to the idea that the dialogue between the involved processes, limited
here to the choice of the trace events and the attributes in the trace, and
to synchronization capacities, must be extended so that it can also
influence the design of the attributes themselves.

— Finally, a crucial question relates to the comprehension of the trace: how
to understand a trace, or how to describe its semantics? If the
semantics of the trace (and of the tracer), or at least a large part of it, is
given a priori (because one has a model for the observed process), then the
comprehension of the trace as well as the implementation of the tracer are
largely facilitated. In the opposite case, which is the case for many natural
or artificial complex processes, one has only vast traces to study. Even in
the field of programming languages, where semantics seems better controlled
a priori, one tries to analyze a program's behavior by trying to understand its
traces. Thus one sees the usefulness of general techniques based on
data mining [21] or on Web mining [22] for such purposes. However, one must
recognize that any full trace will probably always include portions escaping
any kind of description based on formal semantics.